---
title: UnDatasIO
---

This notebook provides a quick overview for getting started with the __UnDatasIO document loader__. UnDatasIO loads and parses PDF, PNG, JPG, JPEG, and JFIF files through its secure cloud API, with support for document lazy loading and native async. The parsed output is ready for generative AI workflows such as retrieval-augmented generation (RAG).

For detailed documentation on all features and configurations, refer to the official API reference.

## Overview

### Loader features

| Source | Document Lazy Loading | Native Async Support |
| :---: | :---: | :---: |
| UnDatasIOLoader | ✅ | ✅ |
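
As a rough illustration of what native async support allows, the sketch below runs several loaders concurrently. LangChain's `BaseLoader` interface defines `alazy_load()` as an async generator; that `UnDatasIOLoader` follows this interface is an assumption based on the table above. `FakeAsyncLoader`, `collect`, and `load_many` are hypothetical names used only so the example runs without an API token.

```python
import asyncio
from typing import AsyncIterator, List


class FakeAsyncLoader:
    """Hypothetical stand-in that mimics a loader's `alazy_load` method."""

    def __init__(self, pages: List[str]) -> None:
        self._pages = pages

    async def alazy_load(self) -> AsyncIterator[str]:
        for page in self._pages:
            await asyncio.sleep(0)  # hand control back to the event loop, as a network call would
            yield page


async def collect(loader) -> List[str]:
    # Drain one loader's async generator into a list.
    return [doc async for doc in loader.alazy_load()]


async def load_many(loaders) -> List[List[str]]:
    # Run all loaders concurrently; results keep the order of the input loaders.
    return await asyncio.gather(*(collect(loader) for loader in loaders))


results = asyncio.run(
    load_many([FakeAsyncLoader(["a1", "a2"]), FakeAsyncLoader(["b1"])])
)
print(results)
```

Inside a Jupyter notebook, where an event loop is already running, `await load_many(...)` replaces the `asyncio.run(...)` call.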

## Setup

### Credentials

UnDatasIO requires an API token.  
Generate a free token at [undatas.io](https://undatas.io) and set it in the cell below:

```python
import getpass
import os

if "UNDATASIO_TOKEN" not in os.environ:
    os.environ["UNDATASIO_TOKEN"] = getpass.getpass(
        "Enter your UnDatasIO API token: "
    )
```
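
In scripts or CI jobs no interactive prompt is available, so it can be preferable to fail fast when the token is missing. The following is a minimal sketch that assumes the same `UNDATASIO_TOKEN` variable as above. `require_token` is a hypothetical helper, not part of `langchain-undatasio`.

```python
import os
from typing import Mapping, Optional


def require_token(
    env: Optional[Mapping[str, str]] = None,
    name: str = "UNDATASIO_TOKEN",
) -> str:
    """Return the API token from the environment, or raise a clear error.

    Hypothetical helper for non-interactive runs; not part of langchain-undatasio.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()  # treat whitespace-only values as missing
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running this script.")
    return token
```

Calling `require_token()` at the top of a script surfaces a missing token before any file is uploaded.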

### Installation

#### Normal Installation

The following package is required to run the rest of this notebook.

```python
# Install the UnDatasIO integration package
%pip install langchain-undatasio
```

## Initialization

The __UnDatasIOLoader__ uploads a single file to the UnDatasIO cloud API and parses it. Pass your API token and the path of the file to parse.

```python
import os

from langchain_undatasio import UnDatasIOLoader

loader = UnDatasIOLoader(
    token=os.environ["UNDATASIO_TOKEN"],  # token set in the Credentials step
    file_path="demo.pdf",  # path of the file to upload and parse
)
```
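
Because every call goes through the cloud API, it can help to reject unsupported files locally before constructing the loader. The sketch below checks only the file extension against the formats listed in the introduction. `check_supported` and `SUPPORTED_SUFFIXES` are hypothetical names, and the service itself may apply different or stricter checks.

```python
from pathlib import Path

# Formats named in the introduction of this page (assumed to be the full list).
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".jfif"}


def check_supported(file_path: str) -> Path:
    """Return the path if its extension is in the supported set, else raise ValueError.

    Hypothetical helper; it inspects the extension only, not the file contents.
    """
    path = Path(file_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        supported = ", ".join(sorted(SUPPORTED_SUFFIXES))
        raise ValueError(
            f"Unsupported file type {path.suffix!r}; expected one of: {supported}"
        )
    return path
```

The returned path can then be passed on, for example as `file_path=str(check_supported("demo.pdf"))`.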

## Load

```python
docs = loader.load()
docs[0]
```

```output
Document(
    metadata={'source': 'demo.pdf', 'task_id': 't1', 'file_id': 'f1'},
    page_content='Growing a Tail: Increasing Output Diversity in Large Language Models\n\nAuthors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*\n\nAffiliations:\n\n1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.\n\n2Faculty of Computer Science, Technion – I'
)
```

```python
print(docs[0].page_content[:300])
```

```output
Growing a Tail: Increasing Output Diversity in Large Language Models

Authors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*

Affiliations:

1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.

2Faculty of Computer Science, Technion – I
```

## Lazy Load

__UnDatasIOLoader__ supports lazy loading for memory-efficient iteration.

```python
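pages = []
for doc in loader.lazy_load():
    # Documents arrive one at a time; process or filter each one here
    # instead of keeping them all if memory is a concern.
    pages.append(doc)

pages[0]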
```

```output
Document(
    metadata={'source': 'demo.pdf', 'task_id': 't1', 'file_id': 'f1'},
    page_content='Growing a Tail: Increasing Output Diversity in Large Language Models\n\nAuthors: Michal Shur-Ofry1, Bar Horowitz-Amsalem1†, Adir Rahamim2, Yonatan Belinkov2*\n\nAffiliations:\n\n1Law Faculty, Hebrew University of Jerusalem; Jerusalem, Israel.\n\n2Faculty of Computer Science, Technion – I'
)
```
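
Collecting every document into a list, as above, gives up most of the memory benefit of lazy loading. One common pattern is to consume the iterator in fixed-size batches, for example before sending chunks to an embedding model. The sketch below uses a stand-in generator in place of `loader.lazy_load()` so it runs without an API token. `batched` and `fake_lazy_load` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_lazy_load() -> Iterator[str]:
    # Stand-in for loader.lazy_load(); yields placeholder strings instead of Documents.
    for i in range(5):
        yield f"doc-{i}"


for batch in batched(fake_lazy_load(), size=2):
    print(len(batch), batch)
```

With a real loader, `batched(loader.lazy_load(), size=...)` would be the corresponding call. Only one batch is held in memory at a time.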

## See Also

- [UnDatasIO](https://undatas.io)
- [langchain-undatasio](https://pypi.org/project/langchain-undatasio/)
